PLEASE: Palm Leaf Search for POMDPs with Large Observation Spaces
Authors
Abstract
Trial-based asynchronous value iteration algorithms for large Partially Observable Markov Decision Processes (POMDPs), such as HSVI2, FSVI and SARSOP, have made impressive progress in the past decade. In the forward exploration phase of these algorithms, only the outcome with the highest potential impact is searched. This paper presents a novel approach, called Palm LEAf SEarch (PLEASE), which allows the selection of more than one outcome when their potential impacts are close to the highest one. Compared with existing trial-based algorithms, PLEASE saves considerable time when propagating the bound improvements of beliefs at deep levels of the search tree back to the root belief, because it performs fewer point-based value backups. Experiments show that PLEASE scales up SARSOP, one of the fastest algorithms, by orders of magnitude on some POMDP tasks with large observation spaces.
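To make the exploration rule concrete: where HSVI2-style search recurses only into the observation branch with the highest potential impact, PLEASE also recurses into branches whose impact is close to that maximum. Below is a minimal Python sketch of such a selection rule, assuming a hypothetical `impacts` map from observations to heuristic impact scores and a hypothetical closeness threshold `ratio`; the paper's exact criterion is not reproduced here.

```python
def select_outcomes(impacts, ratio=0.9):
    """Return all observation branches whose potential impact is close
    to the maximum one (hypothetical selection rule for illustration).

    impacts: dict mapping observation -> heuristic impact score, e.g.
             P(o | b, a) * excess_uncertainty(tau(b, a, o)).
    ratio:   hypothetical closeness threshold in (0, 1].
    """
    best = max(impacts.values())
    if best <= 0:
        return []  # no branch can still improve the bounds at the root
    # Standard trial-based search (HSVI2/SARSOP) would return only the
    # argmax; PLEASE-style search also keeps near-maximal branches.
    return [o for o, v in impacts.items() if v >= ratio * best]
```

With `ratio = 1.0` this degenerates to the usual single-outcome selection; smaller values widen the "palm leaf" of trial paths explored per forward pass.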
Similar Resources
Scalable Planning and Learning for Multiagent POMDPs
Online, sample-based planning algorithms for POMDPs have shown great promise in scaling to problems with large state spaces, but they become intractable for large action and observation spaces. This is particularly problematic in multiagent POMDPs where the action and observation space grows exponentially with the number of agents. To combat this intractability, we propose a novel scalable appr...
Scalable Planning and Learning for Multiagent POMDPs: Extended Version
Online, sample-based planning algorithms for POMDPs have shown great promise in scaling to problems with large state spaces, but they become intractable for large action and observation spaces. This is particularly problematic in multiagent POMDPs where the action and observation space grows exponentially with the number of agents. To combat this intractability, we propose a novel scalable appr...
PEGASUS: A policy search method for large MDPs and POMDPs
We propose a new approach to the problem of searching a space of policies for a Markov decision process (MDP) or a partially observable Markov decision process (POMDP), given a model. Our approach is based on the following observation: Any (PO)MDP can be transformed into an “equivalent” POMDP in which all state transitions (given the current state and action) are deterministic. This reduces the...
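The transformation this abstract alludes to can be illustrated by pre-drawing the randomness a simulator consumes, so that each transition becomes a deterministic function of state, action, and time step. A minimal sketch, assuming a hypothetical `stochastic_step(state, action, u)` interface that maps a uniform random number `u` in [0, 1) to the next state:

```python
import random

def make_deterministic_step(stochastic_step, horizon, seed=0):
    """Sketch of the PEGASUS idea: fix the random numbers in advance so
    that transitions along a trajectory are deterministic.

    stochastic_step(state, action, u) is an assumed interface exposing
    the simulator's randomness as an explicit argument.
    """
    rng = random.Random(seed)
    noise = [rng.random() for _ in range(horizon)]  # drawn once, reused

    def deterministic_step(state, action, t):
        # The same (state, action, t) always yields the same next state,
        # so policy evaluation on this trajectory has no variance.
        return stochastic_step(state, action, noise[t])

    return deterministic_step
```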
Efficient Planning and Tracking in POMDPs with Large Observation Spaces
Planning in partially observable MDPs is computationally limited by the size of the state, action and observation spaces. While many techniques have been proposed to deal with large state and action spaces, the question of automatically finding good low-dimensional observation spaces has not been explored as thoroughly. We show that two different reduction algorithms, one based on clustering an...
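As an illustration of the clustering-based reduction mentioned above, one could group observations whose likelihood vectors P(o | s) are similar and merge each cluster into a single aggregated observation. A hedged sketch (not the paper's algorithm), using k-means over the rows of the observation model:

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_observations(obs_model, k):
    """Reduce the observation space by clustering (illustrative only).

    obs_model: array of shape (num_obs, num_states) where
               obs_model[o, s] = P(o | s) for a fixed action.
    Returns cluster labels and the aggregated observation model.
    """
    labels = KMeans(n_clusters=k, n_init=10).fit_predict(obs_model)
    # Aggregated model: P(cluster c | s) = sum over o in c of P(o | s).
    reduced = np.zeros((k, obs_model.shape[1]))
    for o, c in enumerate(labels):
        reduced[c] += obs_model[o]
    return labels, reduced
```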
Solving POMDPs with Continuous or Large Discrete Observation Spaces
We describe methods to solve partially observable Markov decision processes (POMDPs) with continuous or large discrete observation spaces. Realistic problems often have rich observation spaces, posing significant problems for standard POMDP algorithms that require explicit enumeration of the observations. This problem is usually approached by imposing an a priori discretisation on the observati...
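The a priori discretisation the abstract refers to is typically a fixed grid imposed on the continuous observation space before planning. A minimal illustrative sketch of that baseline, assuming per-dimension bounds and a uniform number of bins:

```python
import numpy as np

def discretize_observation(obs, low, high, bins):
    """Fixed-grid discretisation of a continuous observation
    (the baseline approach, not this paper's method).

    obs:       1-D continuous observation vector.
    low, high: per-dimension bounds of the observation space.
    bins:      number of bins per dimension.
    Returns a single integer id for the discrete observation.
    """
    obs = np.clip(np.asarray(obs, dtype=float), low, high)
    # Map each dimension to a bin index in [0, bins - 1].
    idx = np.floor((obs - low) / (high - low) * bins).astype(int)
    idx = np.minimum(idx, bins - 1)  # handle obs == high exactly
    # Flatten the per-dimension bin indices into one observation id.
    return int(np.ravel_multi_index(idx, (bins,) * len(obs)))
```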
Journal:
Volume / Issue:
Pages: -
Publication year: 2015